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Abstract 



Predictive distributions need to be aggregated when probabilistic forecasts are 
merged, or when expert opinions expressed in terms of probabihty distributions are 
fused. We take a prediction space approach that apphes to discrete, mixed discrete- 
QQ \ continuous and continuous predictive distributions ahke, and study combination formu- 

las for cumulative distribution functions from the perspectives of coherence, probabilis- 
^ . tic and conditional calibration, and dispersion. Both linear and non-linear aggregation 

C/^ \ methods are investigated, including generalized, spread-adjusted and beta-transformed 

(-^ ■ linear pools. The effects and techniques are demonstrated theoretically, in simulation 

examples, and in case studies on density forecasts for S&P 500 returns and daily max- 
imum temperature at Seattle- Tacoma Airport. 



Key words: calibration; coherent combination formula; density forecast; forecast ag- 
gregation; linear pool; probability integral transform. 



1 Introduction 

Probabilistic forecasts aim to provide calibrated and sharp predictive distributions for future 
quantities or events of interest. As they admit the assessment of forecast uncertainty and 
allow for optimal decision making, probabilistic forecasts continue to gain prominence in a 
wealth of applications, ranging from economics and finance to meteorology and climatology 
. (Gneiting 2008). The general goal is to maximize the sharpness of the predictive distributions 

subject to calibration (Murphy and Winkler 1987; Gneiting, Balabdaoui and Raftery 2007). 
For a real-valued outcome, a probabilistic forecast can be represented in the form of a pre- 
dictive cumulative distribution function, which might be discrete, mixed discrete-continuous 
or continuous, with the latter case corresponding to density forecasts. 

In many situations, complementary or competing probabilistic forecasts from dependent 
or independent information sources are available. For example, the individual forecasts might 
stem from distinct experts, organizations or statistical models. The prevalent method for ag- 
gregating the individual predictive distributions into a single combined forecast is the linear 
pool (Stone 1961). While other methods for combining predictive distributions are available 
(Genest and Zidek 1986; Clemen and Winkler 1999; 2007), the linear pool is typically the 
method of choice, with the pioneering work of Winkler (1968) and Zarnowitz (1969), and 
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recent papers by Mitchell and Hall (2005), Wallis (2005), Hall and Mitchell (2007), Jore, 
Mitchell and Vahey (2010), Kascha and Ravazzolo (2010) and Garratt et al. (2011) being ex- 
amples in the case of density forecasts. Similarly, linear pools have been applied successfully 
to combine discrete predictive distributions; for recent reviews, see Ranjan and Gneiting 
(2010) and Clements and Harvey (2011). 

Despite the ubiquitous success of the linear pool in a vast number of applications, frag- 
mented recent work points at potential shortcomings and limitations. Hora (2004) proved 
that any nontrivial convex combination of two calibrated density forecasts is uncalibrated. 
In the discrete case, Dawid, DeGroot and Mortera (1995) and Ranjan and Gneiting (2010) 
showed that linear combination formulas with strictly positive coefficients fail to be coherent 
and demonstrated potential improvement under nonlinear aggregation. 

Our goal here is to unify these results and to extend them in various directions. Towards 
this end, we develop novel theoretical approaches to studying combination formulas and 
aggregation methods, where we think of an aggregation method as a class of combination 
formulas. For example, the traditional linear pools comprises the linear combination formulas 
with nonnegative weights that sum to one. Technically, we operate in terms of cumulative 
distribution functions, which allows us, in contrast to earlier work, to provide a unified 
treatment of all real-valued predictands, including the cases of density forecasts, mixed 
discrete-continuous predictive distributions, probability mass functions for count data and 
probability forecasts of a dichotomous event, all of which are important in applications. 
The extant literature compares combination formulas by examining whether or not they 
possess certain analytic characteristics, such as the strong setwise function and external 
Bayes properties (Genest and Zidek 1986; French and Ri'os Insua 2000). Again in contrast 
to much of the earlier work, we assess combination formulas and aggregation methods from 
the perspectives of coherence, calibration and dispersion. 

Section |2] sets the stage by introducing the key tool of a prediction space, which is a 
probability space tailored to the study of forecasts and combination formulas. We revisit 
the work of Gneiting et al. (2007) and Ranjan and Gneiting (2010) in the prediction space 
setting and show, perhaps surprisingly, that if the outcome is binary, probabilistic calibration 
and conditional calibration are equivalent. Throughout the technical parts of the paper, 
information sets are represented by a-algebras, and a predictive distribution is considered 
ideal if it agrees with the conditional distribution of the predictand, given the information 
basis or a-algebra at hand. 

Section [3] is devoted to the study of specific, linear and non-linear combination formulas 
and aggregation methods. A key result is, roughly, that dispersion tends to increase under 
linear pooling. This helps explain the success of linear combination formulas in aggregating 
underdispersed component distributions, and leads us to unify and strengthen the afore- 
mentioned results of Dawid et al. (1995), Hora (2004) and Ranjan and Gneiting (2010). 
In particular, any linear combination formula with strictly positive coefficients fails to be 
coherent, and the traditional linear pool fails to be fiexibly dispersive. In view of these 
limitations, we investigate parsimonious nonlinear alternatives, including generalized linear 
pools (Dawid et al. 1995), the spread-adjusted linear pool, which has been used successfully 
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in the meteorological literature (Berrocal, Raftery and Gneiting 2007; Glahn et al. 2009), 
and the beta-transformed linear pool (Ranjan and Gneiting 2010), which we demonstrate to 
be flexibly dispersive. 

Section H] turns to a simulation study and data examples on density forecasts for daily 
maximum temperature at Seattle- Tacoma Airport and S&P 500 returns. The paper ends 
in Section [5l where we summarize our findings from applied and theoretical perspectives, 
suggest directions for future work and, in addition to discussing probabilistic forecasts, hint 
at the closely related problem of the fusion of expert judgements that are expressed in terms 
of probability distributions. 

2 Combination formulas and aggregation methods 

In a seminal paper. Murphy and Winkler (1987) proposed a general framework for the 
evaluation of point forecasts, which is based on the joint distribution of the forecast and 
the observation. Dawid et al. (1995) developed and used a related framework in studying 
multiple probability forecasts for a binary event. Here we respond to the call of Dawid et 
al. (1995, p. 288) for an extension, and we start with an informal sketch of a fully general 
approach, in which the observations take values in just any space. 

The most general setting considers the joint distribution of multiple probabilistic fore- 
casts and the observation on a probabihty space (fi,^, Q). More explicitly, we assume that 
the elements of the sample space Q can be identified with tupels of the form 

(Pi, ... , Pfc, Y), 

where each of Pi, . . . , is a probability measure on the outcome space of the observation, Y. 
For i = 1, . . . , A;, we require the random probability measure Pj to be measurable with respect 
to the sub-cr-algebra Ai ^ A that encodes the forecast's information set or information basis, 
consisting of data, expertise, theories and assumptions at hand. The probability measure Q 
on [Q, A) specifies the joint distribution of the probabilistic forecasts and the observation. 

In this setting, the probabilistic forecasts Pi, . . . , P^ might stem from distinct experts, or- 
ganizations or statistical models, as commonly encountered in the practice of forecasting. In 
aggregating them, the theoretically optimal strategy is to combine information sets, that is, 
to issue the conditional distribution of the observation Y given the cr-algebra <7{Ai, . . . , Ak) 
generated by the information sets Ai, . . . , Ak- However, as Dawid et al. (1995, p. 264) note, 

"this ideal will almost always be rendered unattainable, by the extent of the data, 
company confidentiality, or the inability of the experts to identify clearly the empirical 
basis and background knowledge leading to their intuitive opinions. " 

The best that we can hope for in practice is to find the conditional distribution of the obser- 
vation Y given the cr-algebra o"(Pi, . . . , P^) generated by the random probability measures 
Pi, . . . , Pfc. Of course, it is always true that 

(T(Pi,...,Pfc) C a{Au...,Ak), 
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and in most cases of practical interest the left-hand side constitutes a substantially lesser 
information basis than the right-hand side. 

2.1 Prediction spaces 

In what follows, we restrict the discussion to the case of a real-valued observation. A proba- 
bilistic forecast then corresponds to a Lebesgue-Stieltjes measure on the real line, M, which 
we identify with the associated right-continuous cumulative distribution function (CDF). We 
write a{Ai, . . . , Am) and cr(Xi, . . . , X„) to denote the a-algebra generated by the families 
Ai, . . . , Am of subsets of fl, and the random variables Xi, . . . , respectively. We use the 
symbol C generically to denote an unconditional or conditional law or distribution and follow 
standard conventions in identifying the sub-cr-algebras on which we condition. 

We now introduce the key tool of a prediction space, which is a probability space tailored 
to the study of combination formulas for real-valued outcomes, though we allow the case 
= 1 of a single probabilistic forecast. 

Definition 2.1. Let A; > 1 be an integer. A prediction space is a probability space {Q, A, Q) 
together with sub-a-algebras Ai, . . . ,Ak A, where the elements of the sample space fl can 
be identified with tupels (Fi, . . . , Fk,Y,V) such that 

(PI) for i = 1, . . . , /c, Fj is a CDF- valued random quantity that is measurable with respect 
to the sub- 0"- algebra Ai^ 

(P2) y is a real-valued random variable, 

(P3) is a random variable that is uniformly distributed on the unit interval and indepen- 
dent of . . . ,Ak and Y. 

All subsequent definitions and results are within the prediction space setting. Phrases 
such as almost surely or with positive probability refer to the probability measure Q on {Q, A) 
that determines the joint distribution of the probabilistic forecasts and the observations. 
While (PI) and (P2) formalize the predictive distributions and the observation, assumption 
(P3) is purely technical, allowing us to define a generalized version of the classical probabihty 
integral transform. The sub-cx-algebra Ai encodes the information set for the CDF-valued 
random quantity Fi which may, but need not, be ideal in the following sensed 

Definition 2.2. The CDF- valued random quantity is ideal relative to the sub-a-algebra 
Ai if Fi = C{Y\Ai) almost surely. 

In the subsequent examples we write Af{fi, a^) to denote a univariate normal distribution 
with mean /i and variance o"^, and similarly for the bivariate normal distribution. We use 
the symbols $ and to denote the standard normal cumulative distribution function and 
density function, respectively. 

^That is, {Fi{xj) G Bj for j — 1, . . . ,n} 6 A for all finite collections xi, . . . ,x„ of real numbers and 
Bi, . . . , Bn of Borel sets. 

^In a recent comment, Tsyplakov (2011) proposes the same terminology. 
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Table 1: Probabilistic forecasts in Examples 12.41 and 12.101 The observation Y is normal with 
mean fi and variance 1, where /i is standard normal. The random variable r attains the 
values —1 and 1 with probability |, independently of /i and Y. 



Forecast 


Predictive Distribution 


Perfect 


Fi 


= ^f{^i^) 


Climatological 




= AA(0,2) 


Unfocused 






Sign-reversed 


Fa 


= AA(-/i,l) 



Example 2.3 (probability forecasts of a binary event). Here we find a prediction space 
for the simulation example of Ranjan and Gneiting (2010), which considers a dichotomous 
outcome, Y . For technical consistency later on, we identify a success or occurence with the 
outcome F = 0, and a non-success with Y = 1. The CDF- valued random quantities Fj 
thus are of the form Fi{y) = Pil{y > 0) + (1 — Pi)i{y > 1) and can be identified with the 
random probability forecasts Pi for a success, for i = 1,2. Writing u = (wi, 1^2, 1^3, 1^4) for an 
elementary event, we define the probability forecasts 

Pi{uj) = ^l /-^^^ ) and p2{u) = <^l^^L=], 

where ai > and a2 > are fixed constants, the observation Y{u) = U3, and the auxiliary 
variable V{uj) = u^. To complete the specification of the prediction space, we let Q = 
M X R X {0, 1} X (0, 1) and let A = 84^ he the corresponding Borel-a-algebra, define Q to be 
the product of J\f{0,al), A/'(0, (j|) and standard uniform measures on the first, second and 
fourth coordinate projections, respectively, and let 

Q(5iXfi2X{0}x(0,l)) = ^ [ [ ^LOi + tU2)(p(—](p( — ]d\{LOi)d\{L02) 

^i(^2Jb^Jb2 V^l/ Vf^2/ 

for Borel sets -Bi, i?2 ^ I^, where A denotes the Lebesgue measure. Then pi is measurable with 
respect to the sub- a- algebra Ai = <j{oui), and p2 is measurable with respect to A2 = cr(a;2). 
Moreover, by the arguments in the appendix of Ranjan and Gneiting (2010), Fi is ideal 
relative to Ai, and F2 is ideal relative to A2- 

Typically, it suffices to consider the joint distribution of the tuple (Fi, . . . ,Fk,Y), without 
any need to explicitly specify other facets of the prediction space. 

Example 2.4 (density forecasts). To define a prediction space, let 

r I /i ~ A/'(/i, 1) where fi^Af{0,l), 
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Table 2: Some classes of fixed, non-random cumulative distribution functions, where the 
subscript refers to an interval ICR. In the case of Bernoulli measures, we identify a success 
with and a non-success with 1, so that the corresponding cumulative distribution function 
has jump discontinuities at these values, and otherwise is constant. 



Class Characterization of the Members 





support in I 










support in I 


strictly increasing on I 






Ci 


support in I 


continuous 






ct 


support in I 


continuous; strictly increasing on 


I 




support in I 


admits Lebesgue density 






vt 


support in I 


admits Lebesgue density; 


strictly 


increasing on I 


B 


Bernoulh measure 






B+ 


Bernoulh measure with nondegenerate 


success 


probability 



and let r attain the values 1 and —1 with equal probability, independently of /x and Y . Table 
[Uplaces the density forecasts in the simulation example of Gneiting, Balabdaoui and Raftery 
(2007) in this setting. The perfect forecast is ideal relative to the sub-cr-algebra generated by 
/i. The climatological forecast is ideal relative to the trivial sub-cr-algebra. The unfocused 
and sign-biased forecasts are not ideal, as we will see in Example 12.101 below. 

2.2 Combination formulas 

As noted, in aggregating predictive cumulative distribution functions, the theoretically op- 
timal strategy is to combine information sets, that is, to issue the conditional distribution 
of the observation Y given the cr-algebra cr(^i, . . . ,Ak) generated by the information sets 
^1, . . . , Ak- However, information aggregation often is not feasible in practice, when individ- 
ual sources of expertise reveal predictive distributions, rather than information sets. What 
we can realistically aim at is to model the conditional distribution of the observation Y given 
the (T-algebra generated by the predictive cumulative distribution functions, namely 

G = £(r|Fi,...,Ffc), 

where we define 

£ ( F I Fi, . . . , Ffc) = £ ( r I : ^ = 1, . . . , fc, X e Q), 

with Q being the set of the rational numbers. 

In practice one resorts to empirical combination formulas. Specifically, let J-" be a class of 
fixed, non-random cumulative distribution functions such that Fi, . . . , G J-" almost surely. 
For example, if we are concerned with density forecasts on on the real line M, we consider 
the class of the cumulative distribution functions that admit a Lebesgue density. Further 
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classes T of interest are listed in Table El A combination formula then is a mapping of the 
form 

G:T''= Tx- ■ -xj- ^ (Fi, . . . , Ffc) ^ . . . , Ffc). (1) 

A; times 

We allow the case k = 1, where the mapping may provide calibration and dispersion adjust- 
ments for a single predictive distribution, as described later on in Sections 13.41 and El 

Here our interest is in the case k >2, where Dawid et al. (1995) introduced the notion of 
a coherent combination formula in the context of probability forecasts of a binary event. We 
now define a more general notion that applies to probabilistic forecasts of general, real- valued 
quantities. 

Definition 2.5. Suppose that k > 2. The combination formula G : J-''^ — ?■ J-" is coher- 
ent relative to the class J-" if there exists a prediction space {Q, A, Q) with sub- a- algebras 
Al, . . . , Ak A and CDF- valued random quantities Fi, . . . ,Fk such that 

(CI) ioY i = 1, . . . , k, Fi = C(Y\Ai) E T almost surely, 
(C2) for i ^ j , Fi ^ Fj with positive probability, 

(C3) C{Y\ Fi, . . . Ffc) = G(Fi, ...,Fk) almost surely 

It is important to note that condition (CI) has two requirements, the first being that 
Fi = C(Y\Ai) is ideal, and the second that Fi E almost surely. Condition (C2) excludes 
trivial cases, while (C3) requests the aggregated cumulative distribution function to be ideal 
relative to the cx-algebra generated by the full set Fi, . . . , of the components. Note that 
the smaller the class J-', the stronger a statement about coherence. Conversely, the larger 
the class J-', the stronger a statement about lack of coherence. 

Conceptually, a coherent combination formula is compatible with fully informed and 
theoretically literate decision making, whereas an incoherent combination formula is not. In 
this light, coherence is an appealing property of a combination formula. Nevertheless, the 
applied relevance of the notion of coherence is limited, as probabilistic forecasts are hardly 
ever ideal in practice, which may invite, or even necessitate, the use of flexible classes of 
potentially incoherent combination formulas. Motivated by these applied perspectives, we 
now turn to a discussion of the important notions of calibration and dispersion. 

2.3 Calibration and dispersion 

If F denotes a fixed, non-random predictive cumulative distribution function for an obser- 
vation Y, the probability integral transform is the random variable Zp = F{Y). It is well 
known that if F is continuous and F ~ F then Zp is standard uniform (Rosenblatt 1952). 
If the more general, randomized version of the probability integral transform proposed by 
Brockwell (2007) is used, the uniformity result applies to arbitrary, not necessarily contin- 
uous, but still fixed, non-random cumulative distribution functions. In the prediction space 
setting, we need the following, further extension that allows for F to be a CDF-valued 
random quantity. 
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Definition 2.6. In the prediction space setting, tlie random variable 

Zp = lnny^yF{y) + V {F{Y) - lim,^yF(y)) 
is the probability integral transform of the CDF-valued random quantity F. 

In a nutshell, the probability integral transform is the value that the predictive cumu- 
lative distribution function attains at the observation, with suitable adaptions at points of 
discontinuity. The probability integral transform takes values in the unit interval, and so the 
possible values of its variance are constrained to the closed interval [0, |]. A variance of ^ 
corresponds to a uniform distribution and continues to be the most desirable, as evidenced 
by Theorem 12.91 below. 

We are now ready to define and study notions of calibration and dispersion. In doing 
so, we use the terms CDF-valued random quantity and forecast interchangeably. 

Definition 2.7. In the prediction space setting, let F and G be CDF-valued random quan- 
tities with probability integral transforms Zp and Zq- 

(a) The forecast F ist marginally calibrated if EQ[F(y)] = Q{Y < y) for all ?/ G M. 

(b) The forecast F is probabilistically calibrated if its probability integral transform Zp is 
uniformly distributed on the unit interval. 

(c) The forecast F is overdispersed if vai^Zp) < ^, neutrally dispersed if var(Zi?) = ^, 
and underdispersed if \ax{Zp) > rj^. 

(d) The forecast F is at least as dispersed as the forecast G if var(Z^) < Yai{ZQ). It is 
more dispersed than G if var(Z^) < var(ZG'). 

(e) The forecast F is regular if the distribution of Zp is supported on the unit interval. 

In the defining equality Eq[F(?/)] = Q(F < y) for marginal calibration, the left-hand 
side depends on the law of the predictive distribution, whereas the right-hand side depends 
on the law of the observation. In parts (c) and (d) of Definition \2.1\ we define dispersion 
in terms of the variance of the probability integral transform, thus involving the joint law 
of the predictive distribution and the observation. In contrast, the spread of the predictive 
distribution itself is a measure of sharpness that does not consider the observation. 

Our current setting of prediction spaces differs from, but relates closely to, the approach 
of Gneiting et al. (2007), who studied notions of calibration from a prequential perspective. 
Specifically, if the CDF- valued random quantity F is probabilistically calibrated in the sense 
of Definition 12.71 and we sample from the joint law of F and F, the resulting sequence is 
probabilistically calibrated in the sense of Gneiting et al. (2007). An analogous statement 
applies to marginal calibration. 

Returning to the prediction space setting, the following result is immediate. 

Proposition 2.8. A probabilistically calibrated forecast is neutrally dispersed and regular. 
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The converse is not necessarily true, in that a forecast which is neutrally dispersed need 
not be calibrated nor regular. However, an ideal forecast is always calibrated. 

Theorem 2.9. A forecast that is ideal relative to a a-algehra ist both marginally calibrated 
and probabilistically calibrated. 

Proof. Suppose that F = C{Y\Ao) is ideal relative to the a-algebra so that F{y) = 
QO^ < y I v4o) almost surely for all ?/ G M. Then 

EQ[F(y)] = Eq[Q{Y < y\Ao)] = EqEq [I{Y < y)\Ao] = Q{Y < y), 

where I denotes an indicator function, thereby proving the statement about marginal cali- 
bration. Turning to probabilistic calibration, let Qo denote the marginal law of Y under Q, 
so that Zf = Qo((-oo,F)|A) + '^Qo({^}|A) and 

Q{Zp <z)=Eq Eq [I{Zf < ^) I A] = ^ 

for z G (0, 1), where the final equality uses the key result of Brockwell (2007). □ 

Example 2.10. We revisit the density forecasts in Example 12.41 and Table [1] The perfect 
forecast and the climatological forecast are ideal, and so by Theorem 12.91 they are both 
probabilistically calibrated and marginally calibrated. Arguments nearly identical to those 
in Gneiting et al. (2007) show that the unfocused forecaster is probabilistically calibrated but 
not marginally calibrated, and that the sign-biased forecaster is marginally calibrated but 
not probabilistically calibrated. Hence, there is no sub- a- algebra or information set relative 
to which the unfocused or the sign-biased forecaster is ideal. 

Dawid (1984), Diebold et al. (1998), Gneiting et al. (2007) and Czado et al. (2009), 
among others, have argued powerfully that probabilistic calibration is a critical requirement 
for probabilistic forecasts that take the form of predictive cumulative distribution functions, 
with Theorem 12.91 lending further support to this approach. Indeed, checks for the uni- 
formity of the probability integral transform have formed a cornerstone of density forecast 
evaluation. In practice, one observes a sample {{Fij, . . . , Fkj, yj) : j = 1, . . . , J} from the 
joint distribution of the probabilistic forecasts and the observation, and the uniformity of the 
probability integral transform is assessed empirically. The prevalent way of doing this is by 
plotting histograms of the probability integral transform values for the various forecasting 
methods, which show the corresponding frequency distribution over an evaluation or test set. 
U-shaped histograms correspond to underdispersed predictive distributions with prediction 
intervals that are too narrow on average, while hump or inverse U-shaped histograms indi- 
cate overdispersed predictive distributions. Formal tests of uniformity can also be employed; 
for a review, see Corradi and Swanson (2006). 

Example 2.11. Let Y = X + e, where X and e are independent, standard normal random 
variables, and consider the Gaussian predictive distribution F^- = Af{X,a^). A stochastic 
domination argument, the details of which we give in Appendix A, shows that F„ is under- 
dispersed if cr < 1, neutrally dispersed if a = 1 and overdispersed if a > 1. If a = 1 then 
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F(j is ideal and thus both marginally calibrated and probabilistically calibrated. A more 
detailed calculation, which is also given in Appendix A, shows that the probability integral 
transform Z^j = -Fo-(V) satisfies 

var(Z.) = 2j\{l- $(a($-i(^)))) dz-(^j\l- ^a{^-\z)))) dz^ . (2) 

In Figure [1] we plot var(Zo-) as a function of the predictive standard deviation, a. Figure 
[2] shows probability integral transform histograms for a Monte Carlo sample of size 10,000 
from the joint distribution of the observation Y and the forecasts F^, where a = |, 1 and 
|. The histograms are U-shaped, uniform, and inverse U-shaped, reflecting underdispersion, 
neutral dispersion and calibration, and over dispersion, respectively. 

Recall from Example 12.31 that in the case of a binary outcome Y, we identify a CDF- 
valued random quantity F{y) = pl{y > 0) + (1 — p)I{y > 1) with the probability forecast p 
for a success, that is, Y = 0. The extant literature, including Schervish (1989) and Ranjan 
and Gneiting (2010) and the references therein, calls p calibrated if 

Q{Y = 0\p) = p almost surely. (3) 

Here we refer to this property as conditional calibration. Perhaps surprisingly, our next result 
shows that if the outcome is binary, the notions of probabilistic calibration and conditional 
calibration are equivalent. For an illustrating example, see Appendix C. 

Theorem 2.12. Consider a prediction space {Q,A,Q) with a binary outcome Y, where 
y = corresponds to a success and Y = {] to a failure, and a CDF-valued random quantity 
F{y) = P^y > 0) + (1 — p)I{y > 1), which can be identified with the probability forecast p 
for a success. Then the following statements are equivalent: 

(i) The forecast F is probabilistically calibrated, that is, its probability integral transform 
Zp is uniformly distributed on the unit interval. 

(ii) The probability forecast p is conditionally calibrated, that is, Q{Y = 0\p) = p almost 
surely. 

(iii) The forecast F is ideal relative to the a-algebra generated by the probability forecast p. 

Proof. It is clear that (ii) and (iii) are equivalent, and by Theorem 12.91 the statement 
(iii) implies (i). To conclude the proof, we show that statement (i) implies (ii). To this 
end, suppose that the forecast F is probabilistically calibrated. By standard properties of 
conditional expectations, there exists a measurable function q : [0, 1] — )■ [0, 1] such that 
Q{Y = 0\p) = q{p) almost surely. Let H denote the marginal law of p under Q. If H has a 
point mass at or 1, it is readily seen that g(0) = or g(l) = 1, respectively. 
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Figure 1: The variance ([2]) of the probabihty integral transform Z„ = F„{Y) for the predictive 
distribution F„ in Example 12.111 as a function of the predictive standard deviation, a. The 
dashed horizontal line at indicates a neutrally dispersed forecast. 
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Figure 2: Probability integral transform histograms for the predictive distribution in 
Example 12.111 where c" = | (underdispersed) , a = 1 (neutrally dispersed and calibrated) 
and 0" = I (overdispersed) . 
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Table 3: Example of a probabilistically calibrated, but not auto-calibrated CDF- valued 
random quantity F for a ternary outcome Y . 

Q-probability F{x) Q(y = i|F) 

a; < < x < 1 1 < X < 2 x>2 i = {) i = l i = 2 

I I 

13 3 

4 8 8 



A version of the conditional density u{z\x) of the probability integral transform Zp given 
that p = X G [0, 1] satisfies u{z\x) = (1 — g(x))/(l — x) for z E [0,1 — x] and u{z\x) = q{x)/x 
for z E (1 — X, 1]. The marginal density u of Zp is standard uniform, so that 

u{z + 6)-u{z)= [ {u{z + 6\x) -u{z\x))dH{x) = [ '^^f' ~ \ dH{x) = 

J [0,1] J{l-z-5,l-z] X{1 — X) 

for Lebesgue- almost all z G (0, 1) and 5 G (0, 1 — y). Let < a < 6 < 1, and consider the 
signed measure defined by 

for Borel sets A C [a,b]. We have just shown that yu(A) = for all intervals {c,d] C [a,b], 
except possibly for c or d in a Lebesgue null set. Since the family of such intervals generates 
the Borel-o"- algebra, this is only possible if fi is the null measure, so that fi{A) = for 
all Borel sets A in [a,b]. In particular, fi{A) = ior A = {x E [a,b] : q{x) > x} and 
A = {x E [a,b] : q{x) < x}, which implies that q{x) = x almost surely with respect to the 
restriction of H to [a,b]. To summarize, we have shown that q{x) = x almost surely with 
respect to H, whence Q{Y = 0\p) = p almost surely with respect to Q, as desired. □ 

Theorem 12.121 draws a connection from the probability integral transform histogram to 
the reliability diagram or calibration curve, which is the key diagnostic tool for assessing the 
calibration of probability forecasts for a binary event (Dawid 1986; Murphy and Winkler 
1992; Ranjan and Gneiting 2010). A reliability diagram plots conditional event frequencies 
against binned forecast probabilities, with deviations from the diagonal indicating violations 
of the conditional calibration condition ([3]). For non-binary outcomes Y and the natural 
generalization of the conditional calibration criterion, namely the auto- calibration property 

C{Y\F) = F Q-almost surely 

introduced by Tsyplakov (2011) in a comment on Mitchell and Wallis (2010), the equiva- 
lence to probabilistic calibration fails, as demonstrated in Table [3] for a ternary outcome. 
In general, auto-calibration is a much stronger requirement than probabilistic calibration, 
and there is a small but colorful, growing strand of recent literature that addresses its ap- 
plied aspects for continuous outcomes (Hamill 2001; Mason et al. 2007; Held, Rufibach and 
Balabdaoui 2010; Brocker, Siegert and Kantz 2011; Mason et al. 2011). 
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2.4 Aggregation methods 

An aggregation method is a family 

g = {Ge:eee} 

of combination formulas of the form 

Ge:T' = Tx---xT^T, (Fi, . . . , Ffc) ^ Ge{F,, F^). (4) 

that share a common value of k and a common class J-" of fixed, non-random cumulative 
distribution functions. For example, if Q is the traditional linear pool, we can take J-" to be 
any convex class of cumulative distribution functions, and we may identify the index set G 
with the unit simplex in R'^. 

The extant literature studies individual combination formulas by examining whether or 
not they possess certain analytic characteristics, such as the strong setwise function and 
external Bayes properties (McConway 1981; Genest 1984a, 1984b; Genest and Zidek 1986; 
Genest, McConway and Schervish 1986; French and Rios Insua 2000). In contrast, we share 
the recent perspective of Hora (2010) and put the focus on calibration and dispersion. In 
particular, we study aggregation methods in terms of the behavior of the probability integral 
transform under the corresponding family Q = {Gq : 6 G 0} of combination formulas. As 
noted, the probability integral takes values in the unit interval, and so the possible values of 
its variance lie between and j. The value ^ corresponds to neutral dispersion and is the 
most desirable. 

Following French and Rios Insua (2000, p. 113) we say that the combination formula Gg 
is anonymous if 

G'6»(-F7r(i), • • • , -^7r(fc)) = Ge{Fi, . . . , Fk) 

for all Fi, . . . ,Fk G J-" and all permutations vr on elements. With this, we are ready to 
define notions of flexibility for aggregation methods. 

Definition 2.13. Consider a family Q = {Gg : 9 G 0} of combination formulas of the form 
(jl]) that share a common k > 2 and a common class J-" of fixed, non-random cumulative 
distribution functions. 

(a) The aggregation method Q is flexibly dispersive relative to the class J-' if for all Fq E 
and Fi, . . . , Ffc G J-" there exists a parameter value 6* G 6 such that if C{Y) = Fq then 
Gg{Fi, . . . , Ffc) is neutrally dispersed. 

(b) The aggregation method Q is exchangeably flexibly dispersive relative to the class J-" if 
for all Fq G J-" and Fi , . . . , G J-" there exists a parameter value 6* G 9 such that Gg 
is anonymous and if C{Y) = Fq then Gg{Fi, . . . , F^) is neutrally dispersed. 

The applied relevance of the definitions is appreciated as follows. Suppose that the 
aggregation method Q is flexibly dispersive relative to J-". Then, given any marginal law 
Fq G J-" for the observation Y and any collection Fi, . . . , F^. of probabilistic forecasts for Y, 
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we can find a combination formula G0 E Q such that the aggregated predictive distribution, 
namely G0{Fi, . . . , Fk), is neutrally dispersed. If Q is exchangeably flexibly dispersive, we 
can do so while treating Fi, . . . ,Fk exchangeably, which is a frequent requirement in the 
practice of the combination of expert judgements (Jouini and Clemen 1986). 

Note that a positive statement about flexible dispersivity is the stronger, the larger the 
class T. Conversely, a statement about the lack of flexible dispersivity is the stronger, the 
smaller the class J^. 

3 Linear and nonlinear aggregation methods 

In this section we investigate specific combination formulas and aggregation methods from 
the perspectives of coherence, calibration and dispersion. First, wc consider the traditional 
linear pool, then we move on to discuss non-linear ramifications, namely generalized linear 
pools, the spread-adjusted linear pool and the beta transformed linear pool. Note that 
the linearity of a combination formula for cumulative distribution functions is preserved in 
the corresponding formula for aggregating densities or event probabilities. In contrast, the 
functional form of a nonlinear combination formula changes if it is expressed in terms of 
densities or event probabilities. 

3.1 Linear pool 

We proceed to state and prove a simple but powerful result about linear combination formulas 
that generahzes earlier findings by Hora (2004) and Ranjan and Gneiting (2010). The gist 
of the statement is that dispersion tends to increase under linear aggregation. 

Theorem 3.1. In the prediction space setting, suppose that k > 2 and consider any linearly 
combined probabilistic forecast F = Y^'^^i WiFi with weights wi, . . . ,Wk that are strictly pos- 
itive and sum to 1. For i ^ j, suppose that Fj ^ Fj with positive probability. Then the 
following holds: 

(a) The linearly combined forecast F is at least as dispersed as the least dispersed of the 
components Fi, . . . ,Fk. 

(b) // the component forecasts Fi,. . . ,Fk are regular, then F is more dispersed than the 
least dispersed of the components. 

(c) // the components are neutrally dispersed and regular, then F is overdispersed. 

Proof. For i — 1, . . . , A;, let denote the probabihty integral transform of F^. The 
probability integral transform of F = Yli=i '^i^i is Z — Yli=i '^i^ii whence 



k k 



k k 



van 



WiWj cov{Zi, Zj) < Wi Wj 
i=i j=i 



max veniZi) 



max variZj) 



1=1 j=i 
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which demonstrates part (a). To prove part (b) suppose, for a contradiction, that Fi, . . . ,Fk 
are regular and var(Z) = maxi<i<jfc var(Zj). Then Zi and Zj are perfectly correlated for 
i,j = 1, . . . ,k, and we conclude that there exist constants aij > and bij G M such that 
Zi = ttijZj + bij almost surely. By the assumption of regularity, Zi and Zj are supported on 
the unit interval, whence aij = 1 and bij = 0. Therefore, Zi = Zj almost surely, contrary to 
the assumption that Fi ^ Fj with positive probability. Part (c) concerns the special case of 
part (b) in which var(Zj) = ^ for i = 1, . . . , fc. □ 

As noted. Theorem 13.11 yields various extant results as corollaries. For instance, Hora 
(2004) applied Fourier analytic tools to show that if two distinct density forecasts are proba- 
bilistically calibrated, then any nontrivial linear combination is uncalibrated, which is an im- 
mediate consequence of part (c). However, the statement of part (c) is considerably stronger, 
in that it substitutes the weaker condition of neutral dispersion and regularity for the as- 
sumption of probabilistic calibration, allows for any number /c > 2 of components, allows for 
cumulative distribution functions rather than the special case of densities, and exposes the 
direction of the deviation, in that the linearly combined forecast is overdispersed. Each of 
the four facets is useful in practice. For instance, there are real data situations where density 
forecasts are approximately neutrally dispersed and regular, but clearly not calibrated. Dis- 
crete and mixed discrete-continuous predictive cumulative distribution functions also occur 
frequently in practice, such as in quantitative precipitation forecasting (Sloughter et al. 2007) 
and for count data (Czado et al. 2009). Finally, the tendency to increase dispersion helps 
explain the success of linear pooling in applications, where the component distributions are 
frequently underdispersed. For a prominent example, see Table 10 of Hoeting et al. (1999). 

The next theorem nests extant results in the case of probability forecasts for a binary 
event, which were proved by Dawid et al. (1995) and Ranjan and Gneiting (2010). 

Theorem 3.2. If k > 2, any linear combination formula with strictly positive weights fails 
to be coherent relative to the class J-jr. 

Proof. Suppose, for a contradiction, that the combination formula G{Fi, . . . , Fk) = F = 
Yli=i '^i^i with strictly positive weights that sum to 1 is coherent relative to J-r. Then there 
exists a prediction space (f2,^, Q) with sub-a-algebras Ai,...,Ak C A and CDF-valued 
random quantities Fi,...,Fk such that properties (CI), (C2) and (C3) of Definition 12.51 
hold. By property (CI) and Theorem 12. 9[ the components Fi,. . . ,Fk are probabilistically 
calibrated, and by Proposition 12.81 they are neutrally dispersed and regular. Thus, part (c) 
of Theorem 13.11 applies and the linearly aggregated forecast F is overdispersed. However, 
by property (C3), Theorem 12.91 and Proposition 12.81 F is neutrally dispersed, for the desired 
contradiction. □ 

Thus far, we have considered individual linear combination formulas. The following 
result views the traditional linear pool as an aggregation method Q = {Ge : 6 G O}, where 
we may identfy the parameter space O = A^-i with the unit simplex in M'^. We state the 
theorem relative to the full class J-jr, even though it remains valid relative to much smaller 
classes. 
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Table 4: Specifics of tlie generalized linear combination formula in equation (jS]). The table 
states assumptions on the weights, Wi, . . . ,Wk, and instances of classes J-", such that the 
combination formula maps into J-". The conditions depend on the domain and the range 
of the link function, h, which we assume to be continuous and strictly monotone. 



Type Domain Range Weights Class J-" Example 



A 


[0,1] 


any 


Wi 


> 0; Y.i=i 


= 1 






h{x) 


= X 


B 


(0,1) 


(l,oo) 


Wi 


> 0; ELi Wi 


= 1 




or S+ 


h{x) 


= 1/x 


C 


(0,1) 


(-00,0) 


Wi 


> 0; ELi Wi 


> 


•^i 


or B+ 


h{x) 


= log X 


D 


(0,1) 


M 


Wi 


> 0; Y.i=i Wt 


> 




or B+ 


h{x) 





Theorem 3.3. If k > 2, the linear pool fails to be flexibly dispersive relative to the class 

Proof. In view of part (a) of Theorem 13.11 it suffices to find an Fq G J-r and distinct 
Fi, . . . , Ffc G J-jR, each of which is an overdispersed as a probabilistic forecast for an observa- 
tion Y with C{Y) = Fq. For example, we can take Fq to be standard normal and Fj to be 
normal with mean zero and variance i + 1 for i = 1, . . . , k. □ 

3.2 Generalized linear pools 

Dawid et al. (1995) introduced and studied generalized linear combination formulas for 
combining probability forecasts of a binary event. Here we apply the approach to cumulative 
distribution functions, where we obtain combination formulas of the form 

(fc \ k 

Y,Wih{F,{y))] or h{G{y)) = J^w. h{F,{y)), (5) 
i=l / 1=1 

where h is a continuous and strictly monotone link function. It is interesting to note the for- 
mal resemblance to Archimedean copulas (Genest and MacKay 1986; McNeil and Neslehova 
2009), as observed and investigated by Jouini and Clemen (1996). 

Table m shows conditions on the weights, wi, . . . ,Wk, along with instances of classes J-", 
so that the generalized linear combination formula ([5]) maps J-"^ into J-". In the first type, the 
link function is defined on the closed unit interval, and the combination formula operates on 
the full class J-k, with the traditional linear pool, for which h{x) = a; is the identity function, 
being the most prominent example. In the remaining types, the link function is defined on 
the open unit interval only, and we need to restrict attention to J^^ or B~^, with the harmonic 
pool and the geometric pool being key examples, occuring when h{x) = 1/x and h{x) = logx, 
respectively. While not being exhaustive, the listing in the table is comprehensive, in that 
most link functions can be adapted to fit one of the types considered. 

The following result applies to generalized linear combination formulas with nonnegative 
weights that sum to at most 1. 
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Theorem 3.4. Suppose that k > 2, and let I C M be an interval. Consider a generalized 
linear combination formula G of the form ^ with a continuous and strictly monotone link 
function, h, and weights Wi, . . . ,Wk that are strictly positive and sum to at most 1. 

(a) If the link function is defined on the closed unit interval, then G fails to be coherent 
relative to the class Ci. 

(b) If the link function is defined on the open unit interval and square-integrable, then G 
fails to be coherent relative to the class Cf. 

Proof. We proceed similarly to the proofs of Theorems 13.11 and 13.21 which correspond 
to the special case where h is the identity function and the weights sum to 1. As regards 
part (a) suppose, for a contradiction, that the generalized linear combination formula G 
is coherent relative to the class Cr. Consider the prediction space Q) with sub-cr- 

algebras Ai, . . . , Ak ^ A, CDF- valued random quantities Fi, . . . ,Fk G Cr, and observation 
Y such that conditions (CI), (C2) and (C3) of Definition ES] hold. Let X = h{G(Y)) and 
Xi = h{Fi{Y)) for i = 1,...,A;. By (CI), (C3), the continuity of G and h, 

and Theorem 12.91 the random variables X and Xi, . . . X^ are identical in distribution and 
bounded, and so they share a common finite value of the variance. Furthermore, condition 
(C2) implies that iii ^ j then Xi ^ Xj with positive probability, whence Xi and Xj cannot 
be linearly dependent. Consequently, 

k k k k /■ \ 

var(X) = WiWj cov(Xj, Xj) < Wi wj ( max var(Xj) | < var(X), 

i=l j=l i=l j=l ^ ^ 

for the desired contradiction. In part (b) we argue identically, noting that even though X and 
Xi, . . . ,Xk may now be unbounded, they still share a common finite value of the variance, 
by the assumption of square-integrability for the link function. □ 

Dawid et al. (1995) proved in various special cases that generalized linear combination 
formulas with nonnegative weights summing to 1 fail to be coherent relative to the class 
of the nondegenerate Bernoulli measures. Furthermore, Dawid et al. (1985, pp. 282-283) 
conjectured that such results hold in broad generality. While Theorem 13.41 does not apply 
to Bernoulli measures, as its proof depends on the continuity of Fi, . . . , and G, we share 
this belief. 

However, if we allow for individual weights that exceed 1, the situation may change, as 
exemplified now in the case of Bernoulli measures. 



Example 3.5. We revisit Example 12.31 where we identify the ideal predictive distributions 
Fi = C(Y\Ai) and F2 = C{Y\A2) with the corresponding conditional success probabilities 
Pi and P2, respectively. It is then readily seen that 

Q(F = I p„p2) = $((1 + a|y/'<^~\p^) + (1 + aiy/'<^-\p2)) . 
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Hence, the type D generalized linear combination formula 



with a normal quantile link function, $ ^, and weights Wi > 1 and W2 > 1 is coherent 
relative to the class of the nondegenerate Bernoulli measures, or any larger class. 

The defining equation ([5]) for a generalized linear combination formula implies that if 
the weights wi, . . . ,Wk are nonnegative and sum to at most 1 then 



In the previous section we applied this type of argument to show that the traditional linear 
pool fails to be flexibly dispersive. Similarly, we conjecture that generalized linear pools with 
nonnegative weights that are bounded above fail to be flexibly dispersive. 

It is important to note that the extant literature considers generalized linear combination 
formulas that operate on event probabilities or densities, rather than cumulative distribution 
functions. For example, Dawid et al. (1995) study geometric and harmonic linear pools for a 
binary outcome, while Bj0rnland et al. (2011) apply the geometric pool to density forecasts. 
Whether or not the resulting combination formulas and aggregation methods are coherent 
and flexibly dispersive, respectively, remains to be investigated. 

3.3 Spread-adjusted linear pool 

The aforementioned limitations of linear and generalized linear pools suggest that we consider 
more flexible, nonlinear aggregation methods. In this section, we focus on the class "D^, so 
that we may identify the cumulative distribution functions Fi, . . . ,Fk with the corresponding 
Lebesgue densities fi, . . . , f^. 

In the context of probabilistic weather forecasts and approximately neutrally dispersed 
Gaussian components /i, . . . , f^, Berrocal et al. (2007), Glahn et al. (2009) and Kleiber et 
al. (2011) observed empirically that linearly combined predictive distributions are overdis- 
persed, as confirmed by Theorem 13.11 In an ad hoc approach, they proposed a nonlinear 
aggregation method which we now generalize and refer to as the spread- adjusted linear pool 
(SLP), as follows. 

To describe this technique, it is convenient to write Fi{y) = F^{y — /ij) and fi{y) = 
fi{y ~ A^j)) where /Xj is the unique median of Fi G V^, for z = 1, . . . , fc. The SLP combined 
predictive distribution then has cumulative distribution function and Lebesgue density 



respectively, where Wi,...,Wk are nonnegative weights that sum to 1, and c is a strictly 
positive spread adjustment parameter. For neutrally dispersed or overdispersed components 





k 




i=l 



1=1 
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values of c < 1 are appropriate; for example, Table 2 of Berrocal et al. (2007) reports 
estimates ranging from 0.65 to 1.03. Under dispersed components may suggest values of 
c > 1, and the traditional linear pool arises when c = 1. 

We do not know whether or not there are coherent combination formulas of this type. 
However, the SLP method performs well in the aforementioned applications, and the follow- 
ing result serves to quantify its flexibility. 

Proposition 3.6. Suppose that C{Y) = Fq G T)^ and that Fi,. . . , -F^ G have medians 
f^i < •■■ < Aife- Let Zc = Gc{Y) denote the probability integral transform of the SLP 
aggregated predictive cumulative distribution function. Let vq = and pq = Fq{^i), let 
'"i = Y!j=i^j "'^d Pi = Fo(/ii+i) - Fo(/ii) for i = l,...,k - 1, and let Vk = I and pk = 
1 — Fo{fik)- Then as the spread adjustment parameter c > varies, the variance of Zc 
attains any positive value less than 

k / k 

^Pi[vi- ^PjVj J . (7) 

i=0 V j=0 / 

Proof. As c — )■ 0, the probability integral transform Z^ converges weakly to the discrete 
probability measure with mass po, ■ • ■ ^Pfc at wq, . . . , ffc, which has variance ([7]). As c — )■ cxd, it 
converges weakly to the Dirac measure in |. In view of var(Zc) being a continuous function 
of the spread-adjustment parameter c > 0, this proves the claim. □ 

Our next result views the spread-adjusted linear pool as an aggregation method with 
parameter space B = Afc_i x While the SLP approach is sufficiently rich in typical 
applications, where the individual predictive distributions are neutrally dispersed or under- 
dispersed, its flexibility is limited. 

Theorem 3.7. The spread- adjusted linear pool fails to be flexibly dispersive relative to the 
class V^. 

Proof. Let Fq be standard normal, and for i = 1, . . . , /c let be normal with mean 
m-\- — and variance 1. As m — )• oo, the probability integral transform of the SLP combined 
forecast Gc attains values less than ^ with probability tending to one, irrespectively of the 
values of the SLP weights and the spread adjustment parameter c. Thus, if 

m is sufficiently large, the variance of the PIT remains below the critical value of ^ that 
corresponds to neutral dispersion. □ 

The SLP combination formula ^ can be generalized to allow for distinct spread ad- 
justment parameters for the individual components. However, such an extension does not 
admit neutral dispersion either, and tends not to be beneficial in applications, unless the 
component densities have drastically varying degrees of dispersion. The assumption of a 
common spread adjustment parameter yields a more parsimonious model and stabilizes the 
estimation. 
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3.4 Beta-transformed linear pool 

The beta transformed linear pool (BLP) composites the traditional hnear pool with a beta 
transform. Introduced by Ranjan and Gneiting (2010) in the context of probability forecasts 
for a binary event, it generalizes readily to the full class J-jr of the cumulative distribution 
functions on M. Specifically, the BLP combination formula maps Fi, . . . ,Fk G J-jr to Ga,i3 G 
J^K, where 

G^M = B^,ts(^^w^Fiiy)j (8) 

for y G M. Here, Wi, . . . ,Wk are nonnegative weights that sum to 1, and Ba^jj denotes the 
cumulative distribution function of the beta density with parameters a > and /3 > 0. 
In contrast to the spread-adjusted linear pool, the value of the BLP aggregated predictive 
cumulative distribution function Ga,i3 at y E M. depends on Fi,...,Fk only through the 
values Fi{y), . . . , Fk{y), in a locality characteristic that resembles the strong setwise function 
property of McConway (1981). If Fi has Lebesgue density fi for i = 1, k, the aggregated 
cumulative distribution function Ga,i3 is absolutely continuous with Lebesgue density 



ga,(3iy) = ^Wifiiy) ba,l3[ ^WiFi{y) 



J=l J \i=\ 

where fe^,/? denotes the beta density with parameters a > and /3 > 0. This nests the 
traditional linear pool that arises when a = /3 = 1. 

The following result concerns the flexibility of the BLP combination formula (|8]) when 
the cumulative distribution functions Fq G Cr and Fi, . . . ,Fk G Cm are continuous and the 
weights Wi, . . . ,Wk > are fixed, while the transformation parameters vary. 

Proposition 3.8. Let Y have distribution Fq G Cr and suppose that Fi,...,Fk G Cr are 

such that 

supp(Fi) U ■ ■ ■ U supp(Ffc) = supp(Fo). (9) 

Let Za^i3 = Ga,i3{Y) denote the probability integral transform of the BLP aggregated predictice 
cumulative distribution function, where the weights are fixed at strictly positive values that 
sum to 1. Then as the transformation parameters a > and /3 > vary, the variance of 
Za^p attains any value in the open interval (0, |). 

Proof. The variance of Z^^j depends continuously on the transformation parameters 
a > and /3 > 0, with Zq, q, converging weakly to the Dirac measure in ^ as a — )■ oo, 
so that var(Zcj,o) — > as a — )• oo. If we can demonstrate the existence of a sequence 
(a, /3(a)) — )■ (0,0) such that Gq,/3(o)(?/o) = \i where is any median of Fq, the proof is 
complete, as the corresponding probability integral transform Za^p{a) converges weakly to 
the Bernoulli measure with success probability |, so that var(Za,/3(a)) — ?■ ^ as a — ?■ 0. 

We thus strive to find a sequence (a, /3(a)) — )■ (0, 0) such that Ga,i3{a){yo) = -Ba,^(a)(uo) = 
i, where Uq = Yli=i'^i-^iiyo) ^ (0' 1) by the support condition First we show that for 
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every a > there exists a unique f3{a) > such that -Ba, /3(a) (mo) = |; then we prove that 
/3{a) — > as a — )■ 0. As regards the first claim, three cases are to be distinguished. If mq < | 
then Ba^aiuo) < \ and Ba^piuo) — )• 1 as /3 — )■ oo, and continuity and monotonicity with 
respect to (3 imply the existence of a unique /3(a) > a such that -Ba,;3(a)('Uo) = \- If mo = | 
the choice /3(a) = a is unique. If mq > | then 5^,0(^0) > \ and Ba^piuo) — monotonically 
as /3 — )■ 0, and thus there exists a unique /3(a) < a such that -Ba,/3(a)(Mo) = \- To prove the 
second claim, suppose that /3(a) > /3o > for a sequence a — )■ 0. Then as a — )■ the beta 
distribution with parameters a and /3(a) has mean a/(a + /3(a)) — )■ 0, whereas its median 
Mo remains fixed and strictly positive, for the desired contradiction. □ 

The next result views the beta-transformed linear pool as an aggregation method with 
parameter space O = ^k-i x 

Theorem 3.9. The beta transformed linear pool is exchangeably flexibly dispersive relative 
to the class , for every interval / CM. 

Proof. If Fq G Ci and Fi, . . . , G , the support condition ^ is satisfied. We may 
thus apply Theorem 13.81 with weights Wi = ■ ■ ■ = Wk = \- □ 

As hinted at in Section 12. 4[ the spread-adjusted and beta transformed linear pools can 
be applied in the case = 1 of a single probabilistic forecast to provide calibration and dis- 
persion adjustments. If the original predictive distribution is symmetric about its center, the 
symmetry is retained by the spead-adjusted linear pool, and broken by the beta transformed 
linear pool, except when a = /3. 

In practice, the BLP weights Wi, . . . ,Wk and transformation parameters a,/3 > are 
estimated from training data, say {{Fij, . . . , F^j, yj) : j = 1, . . . , J}. If the predictive cumu- 
lative distribution functions F^, . . . ,Fkj are absolutely continuous with Lebesgue densities 
fij, ■ ■ ■ , fkj for j = 1, . . . , J, the aggregated predictive distributions are also absolutely con- 
tinuous, and our preferred estimation technique is to maximize the mean (or sum) of the 
logarithmic score (Matheson and Winkler 1976; Gneiting and Raftery 2007) over the training 
data, namely 

J 

/3) = J2^og{g^Ay,)) (10) 

J / k \ ^ ( ( ^ 

= XI I Wifijiyj) J + X log I I X ^i^ij^y^ 

j=l \i=l J j=l V \i=l 

= X ( - 1) ( E ^iP^Ayj)] + (/3 - 1) log (1 - E ^^^^^(^i 
j=i \ \i=i / \ i=i 

+ ^^^^[^'^ifij(yj)] " -''logB(«,/3), 
j=l \i=l J 
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where B denotes the classical beta function. The logarithmic score is simply the logarithm 
of the value that the density forecast attains at the realizing observation. It is positively 
oriented, that is, the higher the score, the better, and it is proper, in the sense that truth 
telling is an expectation maximizing strategy. Alternatively, the corresponding estimates can 
be viewed as maximum likelihood estimates under the assumption of independence between 
the training cases. 

The optimization can be carried out numerically using the method of scoring, for which 
we give details in Appendix B. Approximate standard errors for the estimates can be obtained 
in the usual way, by evaluating and inverting the Hessian matrix for the mean logarithmic 
score or log likelihood function. However, the estimates of the weights wi, . . . ,Wk need to be 
nonnegative. Thus, if unconstrained optimization results in negative weights, we turn to the 
active barrier algorithm implemented in the constrained optimization routine constrOptim 
in R (R Development Core Team 2011). Similarly, linear, generalized linear and spread- 
adjusted linear combination formulas can be fitted by maximizing the mean logarithmic 
score over training data. For reasons of simplicity and tradition, we frequently refer to the 
resulting estimates as maximum likelihood estimates. 

4 Simulation and data examples 

We now illustrate and complement our theoretical results in simulation and data examples 
on density forecasts. This corresponds to the prediction space setting, where the CDF- 
valued random quantities Fi, . . . , are absolutely continuous almost surely, and thus can 
be identified with random Lebesgue densities /i,...,/^. Throughout the section, we fit 
combination formulas by maximizing the mean logarithmic score over training data, in the 
ways described above and in Appendix B. To lighten the notation, we use the acronyms 
PIT, TLP, SLP and BLP to refer to the probability integral transform and the traditional, 
spread-adjusted and beta transformed linear pool, respectively. 

The recent work of Ranjan and Gneiting (2010) and Clements and Harvey (2011) contains 
a wealth of simulation and data examples on the combination of probability forecasts for a 
binary event. In the concluding Section [5] we summarize these experiences and relate them 
to the findings in the case studies hereinafter. 

4.1 Simulation example 

In this simulation example, the data generating process for the observation, Y, is the regres- 
sion model 

Y = Xo + aiXi + 02X2 + 03X3 + e, (11) 

where 01,02 and 03 are real constants, and Xq, Xi, X2, X^, and e are independent, standard 
normal random variables. The individual predictive distributions rest on partial knowledge 
of the data generating process, in that density forecast /i has access to the covariates Xq 
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Tabic 5: Maximum likelihood estimates with approximate standard errors (in brackets) for 
the parameters of the combined density forecasts in the simulation example. 







W2 


W3 c a /3 


TLP 

SLP 
BLP 


0.212 (0.083) 
0.257 (0.060) 
0.256 (0.057) 


0.254 (0.084) 
0.283 (0.061) 
0.293 (0.057) 


0.534 (0.080) _ _ _ 
0.460 (0.059) 0.783 (0.030) — 
0.451 (0.054) — 1.492 (0.062) 1.440 (0.059) 



Table 6: Variance of the PIT (dispersion) and root mean variance of the density forecast 
(sharpness) in the simulation example, for the test set. A value of or about 0.083 for the 
variance of the PIT indicates neutral dispersion. 
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1.79 


/2 


0.086 


1.79 


fs 
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1.73 


TLP 


0.066 


1.94 


SLP 
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1.62 
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0.084 


1.57 
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Figure 3: PIT histograms for the individual and combined density forecasts in the simulation 
example, for the test set. 
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Table 7: Mean logarithmic score for the individual and combined density forecasts in the 
simulation example, for the training set and the test set. 





Training 


Test 


/l 


-2.025 


-2.018 


/2 


-2.017 


-2.022 


/3 


-1.956 


-1.992 


TLP 


-1.907 


-1.922 


SLP 


-1.871 


-1.892 


BLP 


-1.865 


-1.886 



and Xi, but not to X2 or X3, and similarly for /2 and /a. Thus, we seek to combine the 
density forecasts 

/i = A^(Xo + aiXi, l + al + al), 

/2 = Ar(Xo + 02X2, 1 + a? + a^) and /g = A/'(Xo + 03X3, 1 + a? + a^), 

where Xq stands for shared, public information, while Xi, X2 and X3 represent proprietary 
information sets. The density forecasts represent the true conditional distributions under 
the regression model ffTTl) . given the corresponding partial information, as represented by 
the (T-algebras Ai = (t(Xo,Xi), A2 = cr(Xo,X2) and ^3 = (t(Xo,X3), respectively. Hence, 
the forecasts are ideal in the sense of Definition \2.2\ and by Theorem 12.91 they are both 
probabilistically calibrated and marginally calibrated. 

We estimate the TLP, SLP and BLP combination formulas on a simple random sample 
{{fij, f2j, fsj, Yj) '■ j = ^, ■ ■ ■ , J} of size J = 500 from the joint distribution of the forecasts 
and the observation, and evaluate on an independent test sample of the same size. The 
regression coefficients in the data generating model ( fTTi) are taken to be oi = 02 = 1 and 
03 = 1.1, so that is a more concentrated, sharper density forecast than /i and f2- 

Table [5] shows maximum likelihood estimates, along with approximate standard errors, 
for TLP, SLP and BLP combination formulas. For all three methods, the weight estimate is 
highest for f^, whereas the estimates for /i and /2 are smaller and not significantly different 
from each other. The SLP spread adjustment parameter c is estimated at 0.78, and the BLP 
transformation parameters a and /3 at 1.49 and 1.44, respectively. 

The PIT histograms for the various types of density forecasts over the test set are dis- 
played in Figure [3l with complementary results shown in Table [6l In addition to the variance 
of the PIT, which is our standard measure of dispersion, the table quantifies sharpness in 
terms of the root mean variance (RMV), that is, the square root of the average of the vari- 
ance of the predictive density over the evaluation set. The component forecasts /i, /2 and 
/3 are probabilistically calibrated and thus show uniform empirical PIT histograms, up to 
sample fiuctuations. As mandated by Theorem 13. 11 the linearly combined TLP density fore- 
cast is overdispersed and lacks sharpness. The SLP and BLP agrregated density forecasts 
show nearly uniform PIT histograms; they are approximately neutrally dispersed and much 
sharper than their competitors. 
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Table [7] shows the mean logarithmic score for the various types of density forecasts. The 
best individual density forecast is /s, because it is sharper than /i and f2- The hnearly 
combined density forecast outperforms the individual density forecasts, even though it is 
overdispersed. The nonlinearly aggregated SLP and BLP density forecasts show higher 
scores than any of the individual or linearly combined forecasts, both for the training data, 
where this is trivially true, as the nonlinear methods nest the traditional linear pool, and for 
the test data, where such cannot be guaranteed. 

4.2 Density forecasts for daily maximum temperature at Seattle- 
Tacoma Airport 

With estimates of some one-third of the economy, as well as much of human activity in 
general, being weather sensitive (Button 2002), there is a critical need for calibrated and 
sharp probabilistic weather forecasts, to allow for optimal decision making under inherent 
environmental uncertainty. 

In practice, probabilistic weather forecasts rely on ensemble prediction systems. An 
ensemble system comprises multiple runs of a numerical weather prediction model, with the 
runs differing in the initial conditions and/or the details of the mathematical representation 
of the atmosphere (Palmer 2002; Gneiting and Raftery 2005). Here we consider two-days 
ahead forecasts of daily maximum temperature at Seattle- Tacoma Airport, based on the 
University of Washington Mesoscale Ensemble (Eckel and Mass 2005), which employs a 
regional numerical weather prediction model over the Pacific Northwest, with initial and 
lateral boundary conditions supplied by eight distinct weather centers. A brief description 
of the ensemble members is given in Table |8l 

Our training period ranges from January 1, 2006 to August 12, 2007, with a few days 
missing in the data record, for a total of 500 training cases. The test period extends from 
August 13, 2007 to June 30, 2009, for a total of 559 cases. 

Each ensemble member is a point forecast, which can be viewed as the most extreme 
form of an underdispersed density forecast. To address the underdispersion and obtain 
approximately neutrally dispersed components, we use the maximum likelihood method on 
the training data to fit, for each ensemble member i = 1,...,8 individually, a Gaussian 
predictive density of the form 

fi = M{ai + biXij,ai). 

Here Xij is the point forecast from the zth ensemble member on day j, ai and hi are member 
specific linear bias correction parameters, and ai is the member specific predictive standard 
deviation. From Table [HI we see that the estimates for (Xi, . . . , ag range from 1.958 to 2.214. 

Next we combine the eight individual density forecasts. Table [10] shows maximum like- 
lihood estimates for TLP, SLP and BLP combination formulas. For all three methods, the 
GFS member, /i, obtains the highest weight and the ETA member, /s, the lowest weight. 
This can readily be explained, in that both members have a common institutional origin, 
and thus are highly correlated, whence the more competitive GFS member subsumes the 
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Table 8: Composition of the eight-member University of Washington Mesoscale Ensemble 
(Eckel and Mass 2005), with member acronyms and organizational sources for initial and 
lateral boundary conditions. The United States National Centers for Environmental Pre- 
diction supply two distinct sets of initial and lateral boundary conditions, namely, from its 
Global Forecast System (GFS) and Limited- Area Mesoscale Model (ETA). 



Index 


Acronym 


Source of Initial and Lateral Boundary Conditions 


1 


GFS 


National Centers for Environmental Prediction 


2 


CMCG 


Canadian Meteorological Centre 


3 


ETA 


National Centers for Environmental Prediction 


4 


GASP 


Australian Bureau of Meteorology 


5 


JMA 


Japanese Meteorological Agency 


6 


NGPS 


Fleet Numerical Meteorology and Oceanography Center 


7 


TCWB 


Taiwan Central Weather Bureau 


8 


UKMO 


United Kingdom Met Office 



Table 9: Maximum hkelihood estimates for the predictive standard deviation, Ui, for the 
individual, member specific density forecasts in the temperature example. 





cr2 


CT3 


0-4 


0-5 


CT6 


(77 


eg 


1.966 


2.051 


2.119 


2.214 


1.958 


2.055 


2.084 


1.995 



Table 10: Maximum likelihood estimates for the parameters of the combined density forecasts 
in the temperature example, including the Bayesian model averaging (BMA) approach of 
Raftery et al. (2005). 







W2 


W3 


W4 


W5 


We 


wr 


Ws 


c 


a 


/3 


a 


TLP 


0.394 


0.005 


0.000 


0.000 


0.317 


0.030 


0.144 


0.109 










SLP 


0.304 


0.080 


0.000 


0.085 


0.216 


0.051 


0.172 


0.090 


0.768 








BLP 


0.295 


0.079 


0.000 


0.083 


0.230 


0.062 


0.173 


0.076 




1.467 


1.467 




BMA 


0.305 


0.075 


0.000 


0.081 


0.216 


0.056 


0.170 


0.098 








1.566 



Table 11: Variance of the PIT (dispersion) and root mean variance of the density forecast 
(sharpness) in the temperature example, for the test period. A value of ^ or about 0.083 
for the variance of the PIT indicates neutral dispersion. 
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Temperature in Degrees Celsius 

Figure 4: Two-day ahead density forecasts for the maximum temperature at Seattle- Tacoma 
Airport on June 28, 2008. The vertical line is at the verifying maximum, at 32.8 degrees 
Celsius or 91 degrees Fahrenheit. 

weight of the ETA member. The SLP spread adjustment parameter is estimated at 0.768, 
and the BLP transformation parameters both at 1.467. 

Figure m illustrates the various density forecasts for June 28, 2008, an unusually hot day 
at Seattle- Tacoma Airport with a verifying maximum temperature of 32.8 degrees Celsius 
or 91 degrees Fahrenheit. The member specific individual density forecasts are shown by 
the dotted lines, and the linearly combined TLP forecast by the dash-dotted line. The 
nonlinearly aggregated SLP and BLP density forecasts, which are shown by the solid and 
dashed line, respectively, are sharper than the TLP density. 

PIT histograms for the test period are shown in Figure [5l along with summary measures 
of dispersion and sharpness in Table [TTl The individual, member specific density forecasts 
tend to be a bit overdispersed. The linearly aggregated TLP density forecast is much more 
severely overdispersed, as reflected by an inverse U-shaped and skewed PIT histogram. Of 
course, the over dispersion is not surprising, as it is a direct consequence of Theorem 13.11 
The SLP and BLP aggregated density forecasts show somewhat rough and skewed, yet more 
nearly uniform PIT histograms. 

These results are corroborated by Table [T2|, which shows the mean logarithmic score 
for the various types of density forecasts, both for the training period and the test period. 
The linearly combined TLP forecast shows a higher score than any of the individual density 
forecasts, which attests to the benefits of aggregation. Nevertheless, the linearly combined 
density forecast is suboptimal, because it is overdispersed and lacks sharpness, and thus it 
is outperformed by the nonlinearly aggregated SLP and BLP density forecasts. 

Finally, we compare to the Bayesian model averaging (BMA; Raftery et al. 2005) tech- 
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Figure 5: PIT histograms for the individual and combined density forecasts in the temper- 
ature example, for the test period. 



28 



Table 12: Mean logarithmic score for the individual and combined density forecasts in the 
temperature example, for the training period and the test period. 



Training 



Test 



/i 

h 
h 
h 
h 
h 

fr 
/s 



-2.091 
-2.134 
-2.167 
-2.211 
-2.088 
-2.136 
-2.150 
-2.107 
-2.027 
-1.990 
-1.988 
-1.992 



2.088 
2.071 
2.093 
2.172 
2.043 
2.143 
2.131 
2.041 
2.010 
1.961 
1.960 
1.963 



TLP 



SLP 



BLP 



BMA 



nique, which is a state of the art approach to generating density forecasts from forecast 
ensembles. The BMA density forecast for day j is of the form 

8 



with BMA weights, wi, . . . ,ws, that are nonnegative and sum to 1, member specific bias 
parameters a, and bi for i = 1, . . . , 8, and a common variance parameter, o"^. In view of 
our individual density forecasts being Gaussian, the TLP and BMA densities are of the 
same functional form. However, there is a conceptual difference, in that the TLP weights 
are fitted conditionally on the individual density forecasts. Thus, a two-stage procedure is 
used, in which the member specific component densities are estimated first, and only then 
the weights, with the components held fixed. In contrast, the BMA method estimates the 
weights, wi, . . . , ws, and the common spread parameter, a, for the component forecasts in the 
Gaussian mixture model f|T2l) simultaneously. While the BMA method can be employed with 
member specific spread parameters, the assumption of a common spread parameter stabilizes 
the estimation algorithm and does not appreciably deteriorate the predictive performance 
(Raftery et al. 2005). 

Table [10] shows maximum likelihood estimates for the BMA parameters, obtained with 
the R package ensembleBMA (Fraley et al. 2011). The BMA weights echo the SLP weights. 
The BMA spread parameter a is estimated at 1.566 and differs from the predictive standard 
deviations for the member specific density forecasts in Tabled by factors ranging from 0.707 
to 0.800, much in line with our estimate of 0.768 for the SLP spread adjustment parameter, 
c. Thus, the SLP and BMA density forecasts are very much alike, which is confirmed by the 
PIT histograms in Figure O the summary measures in Table [TT] and the logarithmic scores in 
Table [121 In Figured] the graphs for the SLP and BMA density forecasts are nearly identical 
and lie essentially on top of each other, and so we refrain from plotting the BMA density. 




(12) 



i=l 
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4.3 Density forecasts for S&P 500 returns 

In this final data example, we follow Diebold et al. (1998) in considering S&P 500 log returns 
for the period of July 3, 1962 to December 29, 1995. The data record through December 1978 
is used as training set, for a total of 4,133 training cases. All estimates reported are maximum 
likelihood fits on the training period obtained with the R package fGarch (Wuertz and 
Rmetrics Core Team 2007). The balance of the record is used as test period, for a total of 
4,298 one-day ahead density forecasts. 

The first component forecast, /i, is based on a generalized autoregressive conditional 
heteroscedasticity (GARCH; BoUerslev 1986) specification for the variance structure. With rj 
denoting the log return on day t, our GARCH(1,1) model assumes that = atet, where et is 
Student-t distributed with u degrees of freedom and variance 1, while at evolves dynamically 
as 

2 I 2 , n 2 

The maximum likelihood estimates for the GARCH parameters are u = 0.000, a = 0.089, 
13 = 0.903 and = 9.25. 

The second component forecast, /2, is based on a standard moving average (MA) model 
for the mean dynamics, which assumes that = Zt + 6Zt-i, where {Zt} is a Gaussian white 
noise process with mean zero and variance o"^. The maximum likelihood estimates for the 
MA parameters are 6 = 0.252 and a = 0.00736. 

Our goal now is to combine the density forecasts /i and f2- Table [T3l shows maximum 
likelihood estimates for TLP, SLP and BLP combination formulas. For all three methods, 
the conditionally heteroscedastic density forecast /i obtains a much higher weight than the 
simplistic density forecast f2- The SLP spread adjustment parameter is estimated at 0.940, 
and the BLP transformation parameters a and /3 at 1.100 and 1.081. This suggests that 
the overdispersion of the TLP density forecast is quite mild, which is confirmed by the PIT 
histogram in Figure [6] and the summary measures in Table [T41 

Table [TS] shows the mean logarithmic score for the various types of probabilistic forecasts. 
The TLP density forecast performs slightly better than the individual component f\, with 
a score that is very slightly lower than for the nonlinearly aggregated SLP and BLP density 
forecasts, both for the training and the test period. As observed by Geweke and Amisano 
(2011), there is little reward for using more elaborate, less parsimonious aggregation methods 
for density forecasts of S&P 500 returnsH 

Finally, we consider the predictive performance of a more comprehensive predictive 
model, which addresses both the first and the second order dynamics, in that rt = fit + et 
where {fit} and {et} are MA(1) and Student-t GARCII(1,1) processes, respectively. The 
maximum likelihood estimates in this mixed specification are 6 = 0.269 and a = 0.00736 for 
the MA parameters, and u = 0.000, a = 0.098, /3 = 0.892 and z/ = 8.284 for the GARCH 
parameters. The resulting density forecast can be thought of as combining information sets 

■^The logarithmic scores reported by Geweke and Amisano (2011) are summed, rather than averaged, and 
apply to percent log returns, rather than log returns. Adjusted for these differences, they are comparable to 
the scores in Table [151 
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Table 13: Maximum likelihood estimates of the parameters for the combined density forecasts 
in the S&P 500 example. 
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Figure 6: PIT histograms for the combined density forecasts in the S&P 500 example, for 
the test period. 



with respect to the first and second order dynamics, as opposed to combining the corre- 
sponding component forecasts /i and f2- It outperforms the other types of density forecasts 
and achieves a mean logarithmic score of 3.638 for the training period and 3.473 for the test 
period. 



5 Discussion 

We have studied methods for combining predictive distributions. Prom a theoretical per- 
spective, our approach deviates from previous work in major ways. Technically, we operate 
in terms of prediction spaces and cumulative distribution functions, which allows for a uni- 
fied treatment of all real-valued predictands including, for example, density forecasts for 
continuous variables and probability forecasts of a binary event. Conceptually, our work 
is motivated by applications in probabilistic forecasting, and thus we assess combination 
formulas and aggregation methods in terms of coherency, calibration, and dispersion. 

While it is not difficult to construct coherent combination formulas, the formulas typically 
used in practice tend to be incoherent. Our analytical results about the linear pool extend 
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Table 14: Variance of the PIT (dispersion) and root mean variance of the density forecast 
(sharpness) in the S&P 500 example, for the test period. A value of or about 0.083 for 
the variance of the PIT indicates neutral dispersion. 





var(PIT) 


RMV 




/l 


0.081 


9.23 X 10" 


-3 


/2 


0.078 


7.36 X 10- 


-3 


TLP 


0.079 


8.89 X 10- 


-3 


SLP 


0.083 


8.22 X 10" 


-3 


BLP 


0.084 


8.22 X 10" 


-3 



Table 15: Mean logarithmic score for the individual and combined density forecasts in the 
S&P 500 example, for the training period and the test period. 





Training 


Test 


/l 


3.606 


3.458 


/2 


3.492 


3.247 


TLP 


3.612 


3.469 


SLP 


3.614 


3.470 


BLP 


3.614 


3.470 



and unify extant work by Dawid et al. (1995), Hora (2004) and Ranjan and Gneiting (2010), 
and show that any linear combination formula with strictly positive coefficients fails to be 
coherent. In this sense, combined evidence is nonlinear. An interesting open problem is 
whether or not parsimonious ramifications, such as spread- adjusted or beta transformed 
linear combination formulas, are coherent. 

That said, the applied relevance of the notion of coherency is limited, as predictive dis- 
tributions or expert opinions issued in practice are hardly ever ideal. In typical practice, 
underdispersed or approximately neutrally dispersed predictive distributions are to be ag- 
gregated. In the case of underdispersed components, the tendency of linear combination 
formulas to increase dispersion can be beneficial, and helps explain the success of linear 
pooling in applications (Madigan and Raftery 1994). However, if the components are neu- 
trally dispersed, the failure of the traditional linear pool to be flexibly dispersive is a serious 
limitation. Berrocal et al. (2007), Glahn et al. (2009) and Kleiber et al. (2011) observed 
this empirically in the context of probabilistic weather forecasts, and proposed a special 
case of the spread-adjusted linear pool as an ad hoc remedy. Our theoretical results docu- 
ment the increased flexibility of the spread-adjusted linear pool, and demonstrate that the 
beta-transformed linear pool is exchangeably flexibly dispersive. 

Not surprisingly, the parsimonity principle and the bias- variance tradeoff apply in the 
practice of the combination of predictive distributions. Thus, in data poor settings, where 
training data are scarce, the parsimonious traditional linear pool might be the method 
of choice, despite its theoretical shortcomings, as demonstrated persuasively in the recent 
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simulation study of Clements and Harvey (2011). In data rich settings, where predictive 
models can reliably be estimated, linear aggregation tends to be suboptimal. Hence, we have 
studied parsimonious nonlinear alternatives, including the spread-adjusted linear pool (SLP) 
and the beta transformed linear pool (BLP). Further options include consensus methods 
(Winkler 1968) and nonparametric approaches, such as isotonic recursive partitioning (Luss, 
Rosset and Shahar 2011). As Winkler (1986) noted, "different combining rules are suitable 
for different situations" . 

The SLP and BLP approaches can also be used to provide calibration and dispersion 
corrections to a single predictive distribution, similar to the methods described by Cox 
(1958), Piatt (1999), Zadrozny and Elkan (2002) and Primo et al. (2009) in the context of 
probability forecasts for a binary event. An interesting question then is whether dispersion 
adjustments ought to be applied to the individual components prior to the aggregation. In 
situations in which the components show substantially differing degrees of dispersion, or 
are uniformly under- or overdispersed, we indeed see potential benefits in doing this, with 
(here unreported) simulation experiments providing partial support to this view. In our 
temperature example, the components derive from point forecasts, which is the most extreme 
form of underdispersion, and prior to aggregating the components we apply a simple Gaussian 
technique, which obtains approximately neutrally dispersed individual density forecasts. 

In addition to their relevance in probabilistic forecasting, our findings bear on the related 
problem of the fusion of expert opinions that are expressed in the form of probability dis- 
tributions. Ha-Duong (2008) reviews methods for doing this, and applies them to combine 
expert opinions about the climate sensitivity constant, which is a key quantity in the study 
of the greenhouse effect. Our analytic results imply that if each individual expert is neu- 
trally dispersed, linear aggregation results in combined assessments that are underconfident 
and show an unduly wide range of uncertainty, when in fact a sharper assessment could 
be made. In situations of this type, the parsimonious SLP technique with its single, easily 
interpretable spread-adjustment parameter, c, might be preferable to the BLP, particularly 
when the individual components are symmetric and unimodal. Our theoretical results show 
that neutrally dispersed components require values of c < 1, with more specific recommen- 
dations depending on the subject matter and situation at hand. The case c = 1 corresponds 
to the traditional linear pool and can be a useful choice in situations in which overconfident 
experts make underdispersed judgements. 



Appendix A: Details for Example 12.11 



Let Zf, = Fcr{Y) denote the probability integral transform of the CDF- valued random quan- 
tity F^. Then the random variable Z^- has expectation | and its cumulative distribution 
function is H^{z) = $(cr $"^(2;)). In particular, Zi is uniformly distributed. If ex < 1 then 
— i| is stochastically larger than \Zi — ^\ and therefore 

var(Z,) = E{Z^ - E[Z^]f = E\Z^ - > E|Zi - = 1. 
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An analogous argument applies when a > 1. To prove the variance formula ([2]), we use the 
fact that var(Zo-) = — (E[Zo-])^ and invoke the well-known expectation equality E,[Z^'] = 
r Jg°° — H[z)) dz for a nonnegative random variable Z with cumulative distribution 

function H, where r > 0. 



Appendix B: Method of scoring 

Here we give details for the method of scoring (see, for example, Ferguson 1996) for nu- 
merically maximizing the mean logarithmic score or log likelihood function ( ITOl) of the BLP 
model as a function of the nonnegative weights Wi, . . . ,Wk that sum to 1, and transforma- 
tion parameters a, /3 > 0. Let Y denote a random variable that has a beta distribution with 
parameters a and /3. Then 



da 



5^1og(X^m.F,,(y,) ] - JE[logy], 
j=i \i=i J 



and 



U V Eti ^iFiAy^) 1 - Ell wiF,,{y,) Eti y^ifijijj,: 

for i = 1, . . . , k — 1. The second derivatives are 

= -Jvar(log(r)), = -Jvar(log(l-r)), = - Jcov(log(r), log(l - F)) 

and 



da dw, ^ mF,,{y,) ' 5/3 9^. ^ 1 - ^f^^ 

for i = 1, . . . , k — 1, while 

J^i \KLi=iWiFij[yj)Y (1 - Ei=i^«^;i(%))V 

for ii = 1, . . . , k — 1 and 22 = 1, . . . , k — 1. The method of scoring now applies Newton's 
algorithm to optimize the likelihood as a function of the parameter vector. 
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